| Host | Total | Guides | Lists and datasets | Packages and libraries | Products | Scripts | Specifications, protocols and schemas | Standalone software |
|---|---|---|---|---|---|---|---|---|
| GitHub | 458 | 21 | 54 | 226 | 12 | 52 | 6 | 87 |
| Codeberg | 16 | 2 | 1 | 3 | 0 | 5 | 3 | 2 |
| GitLab | 7 | 1 | 0 | 3 | 1 | 0 | 0 | 2 |
| BitBucket | 3 | 0 | 0 | 2 | 0 | 0 | 0 | 1 |
| Launchpad | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| None | 69 | 6 | 19 | 12 | 2 | 12 | 5 | 13 |
Open archaeology, open source?
Collaborative practices in an emerging community of archaeological software engineers
Surveying the first quarter-century of computer applications in archaeology, Scollar (1999) lamented that the field relied almost exclusively on “hand-me-down” tools repurposed from other disciplines. Twenty-five years later, this is no longer the case: computational archaeologists often find themselves practicing the dual roles of data analyst and research software engineer (Baxter et al. 2012), developing and applying new tools that are tailored specifically to archaeological problems and archaeological methods. Though this trend can be traced to the very earliest days of the field (Cowgill 1967), its most recent manifestation is distinguished by its apparent embrace of practices from free and open source software (FOSS). Most prominently, since around 2015, there has been a rapid uptake of workflow tools designed for open source development communities, such as the version control system git and associated online source code management platforms (e.g. GitHub, GitLab). These tools facilitate collaboration among developers and users of FOSS using patterns that can diverge quite radically from conventional scholarly norms (Tennant et al. 2020).
In this paper, we investigate modes of collaboration in this emerging community of practice using open-archaeo (https://open-archaeo.info/), a curated list of 500+ active open projects in archaeology, augmented where available with data on their activity on GitHub. We apply network analysis to map the networks of collaboration that have emerged among archaeologists using GitHub, and conduct a quantitative exploratory analysis to characterize the nature and intensity of these collaborations. Our results show an uneven adoption of open source collaborative practices beyond the basic use of git as a version control system and GitHub to host source code. A majority of repositories have 1–3 contributors, with only a few projects distinguished by an active and diverse developer base. More community-oriented features, such as GitHub’s ‘issues’ and ‘comments’, are used in only a minority of repositories. Preliminary findings from ongoing work also show how collaborative behaviours vary across different subcommunities. By analyzing interactions among archaeologists using the GitHub platform, including the ways they contribute to collective code bases, we demonstrate how archaeological software engineering is beginning to foster new kinds of collaborative commitments that blend open source protocols, DIY work ethic, conventional forms of scholarly communication, and cultural practices unique to archaeology.
1 Introduction
In their seminal vision for open archaeology, Beck and Neylon (2012, 480–81) identified the movement as comprising a series of principles and practices “predicated on promoting open redistribution and access to the data, processes and syntheses generated within the archaeological domain” with the aim of “maximizing transparency, reuse and engagement while maintaining professional probity”. Open archaeology therefore promotes more thoughtful scholarly communication practices, most notably in operations relating to publishing, data-sharing, education, and review processes. Archaeologists are also actively engaged in open source software development as a means of sharing their research processes and creating tools and resources for general use.
However, academic open source has a complicated relationship with open source as practiced by professional software developers, which has its own distinct history and is framed by different objectives, challenges, and value regimes. Despite this, the open science movement, within which open archaeology emerged, draws direct inspiration from open source. For instance, the Open Knowledge Foundation (2023) publishes a widely accepted definition of “open” in the context of scholarly communication that explicitly refers to the definition of “open source” published by the Open Source Initiative (2007), an authoritative open source advocacy group. The open science movement further mimics open source by operationalizing scholarly communication through technical infrastructures and protocols that closely resemble systems and processes designed to develop open source software (e.g., the use of plain text, line-resolution version control, emphasis on formal licencing, the general hacker aesthetic). However, academic work, including the development of academic software, differs significantly from the work involved in massive open source projects that literally run the internet, such as the Linux kernel, openSSL, and the Firefox web browser. While they may use similar tools and technical protocols to manage coding operations, the open science and open source movements are governed by different social and professional warrants and interests. In other words, publishing code openly on the web has different meanings, impacts and implications for archaeologists and professional software developers (Kelty 2008 xx).
The relationship between open science and open source is also complicated by rhetorical claims that have a questionable connection to how academics actually do open source. Does academic open source actually make research processes more transparent and improve research outcomes? Is it actually boosting efficiency by establishing a common store of knowledge and productive code? Is it actually helping to foster new globe-spanning connections and lead to novel research trajectories that would not otherwise come to pass? Basically, is it more than just uploading code and data files to the internet?
Despite hopeful aspirations espoused by open archaeology advocates (see Kansa, Whitcher Kansa, and Arbuckle 2014; Kintigh et al. 2015), these outcomes are not a given. We believe that these outcomes only actually arise in contexts where participants adhere to and are motivated by warrants, professional norms, and governance strategies that encourage these results. However, practical circumstances and systemic value regimes that frame what it means to work as an archaeologist presently inhibit the potential for radical transformation, even among open science’s most ardent supporters.
That being said, archaeologists are prolific software developers. This is made evident by open-archaeo, a comprehensive list of open source archaeological software and resources that indexes XXX records as of the time of publication. But beyond simply making their code available on the web, do archaeologists also implement social strategies to advance open source ideals? Does archaeological open source actually help achieve greater transparency, sustainability, and community participation? And if not, what does it actually achieve?
This article presents a survey of archaeological software development with two goals in mind:
- we identify what kinds of software archaeologists are making;
- and evaluate how archaeologists create these tools, with particular emphasis on practices of collaboration.
Our statistical and network analyses consider how archaeological software development may be benefiting from or missing out on the affordances that open source development models provide, specifically the value added through working as part of a broader community of invested stakeholders, processes of iterative improvement, and increased code transparency. We found that the vast majority of projects lack any collective effort and activity tends to abruptly end shortly after work is initiated. Projects that are maintained by multiple contributors tend to be backed by funded initiatives who employ locally and socially engaged individuals, or are warranted by participation in a scholarly community exhibiting genuine need for a particular kind of resource. Moreover, we found that the individuals who play critical roles in supporting the archaeological open source community are precariously employed workers. Contrary to popular claims about open source being inherently distributed, resilient, and open-ended, we found that, overall, archaeological open source is actually quite centralized, fragile, and based nearly exclusively on existing professional connections and endeavours.
2 Open science and open source
2.1 Open source
Open source is a software development model that prioritizes transparent work processes. Initially driven by the idea that computer users should be free to understand and manipulate the software that they install on their computers (e.g., “free software”, as initially conceived by the Free Software Foundation), open source has become a means of collaborative software development (Kelty 2008: chapter 3, especially starting at page 99). Putting one’s code on the web without restriction on how it may be used or manipulated encourages creativity to flourish as people contribute to help improve the code base. Software thus emerges from the coordinated labour of worldwide volunteers, who shape the product according to the collective vision (Kelty 2008 on the emacs saga). An open code base may also be used to support alternative projects whose missions diverge from the original plan, and an entire project may be “forked”, or taken in a new direction, if contributors are dissatisfied with how core developers run things.
Open source has traditionally been referred to as being based on meritocratic principles. A good test of whether a contribution should be included in a published software release is whether it is functional (Kelty 2008, 220). Moreover, with more eyes looking over a code base, it is easier to identify flaws with a contribution and flag potential bugs or security issues. This is all done in the spirit of producing functional code, and in ideal circumstances faulty contributions will be corrected before inclusion. Personal ego is minimized in favour of co-creating stable and functional outcomes (Coleman 2012; O’Neil 2009 on the Debian saga).
However, this is not the same as saying that open source is completely anarchic or based on the “wisdom of crowds”. In fact, successful open source projects incorporate complex organizational structures, governance strategies, and forms of social mediation to help delegate and vet contributions made by distributed participants (O’Neil 2009). They rely on, rather than eschew, institutional support structures, in order to motivate work, keep volunteer maintainers involved, and generally ensure that the project can be sustained over the long term. Open source is more than just putting your code online; to be successful, it requires participation in a social experience (Kelty 2008, Ratto dissertation and derived works).
In other words, as with many so-called “soft skills” that are crucial for academic professional development, additional competencies relating to the maintenance, management, and distribution of software, such as the ability to receive and implement feedback, set and stick with long-term goals, coordinate labour, document work practices, and collaborate with others, are grossly under-appreciated factors that contribute to an open source project’s success.
We therefore consider open source to be a means of collaboration more than a means of transmitting information. It involves developing software as part of a group, developing consensus, and working with common purpose. Crucially, it also involves having a welcoming attitude, a sense of humility, and an understanding that one’s work may be appropriated and used in unanticipated ways.
2.2 Open science
The open science movement comprises a series of practices and principles intended to make research more accessible, transparent, and efficient. Although the concept of “open” is somewhat nebulous, in terms of its abstract definition and with regards to what real-world applications count as being open, one commonly-cited definition describes content that “can be freely used, modified, and shared by anyone for any purpose” (Open Knowledge Foundation 2023). This definition does not state what open is for, how to be open, or any sort of social or discursive framing behind the open movement. However, most open science advocates (including archaeologists, as elicited by Beck and Neylon (2012) and Marwick et al. (2017)) claim that they are motivated by a desire to facilitate novel research opportunities, make participation in scientific research more equitable, reclaim science as a public good, and enhance how findings are validated and legitimized.
The idea that scientists should generally contribute to a public domain of knowledge without profit motive has led to open science being heralded as a revolutionary, community-oriented, and anti-capitalist means of production. However, while open science does have the potential to effect radical change, this is not a given. The social and institutional contexts in which we do science are firmly embedded within capitalist and neoliberal power structures that reward individualistic competition and do little to actually encourage equitable and accessible research practices, and as such, make it difficult to fully embrace open science ideals (Mirowski 2018). Moreover, the open science movement, which is dominated by STEM disciplines, prioritizes a grossly simplified and asocial notion of what science is and entails. Namely, it considers science as the accumulation and assembly of a species-level understanding of the world, which is not held by any one individual but is stored in seemingly value-neutral and disembodied media, facts and observations. This is manifested by digital telecommunications systems that host files, document processes, facilitate co-working opportunities, and perform automated processes. However, these systems have become so emblematic of open science that using the tools and resources designed to support open science is often mistaken for actually doing open science.
Open science is typically compared with the open source movement in that they both involve a distributed, digitally-mediated and worldwide labour force, who somehow derive rough consensus directed towards assets held in the public domain (Tennant et al. 2020). But they differ in terms of the contexts in which they operate, the stakeholders involved, and the kinds of outcomes they produce. Whereas open source emerged from a concern for consumer rights and then developed as a means of maintaining resilient and collectively motivated projects, open science comes out of a desire to make research practices more transparent and accessible. Open source is performed by professional and hobbyist software developers alike, and participants contribute in a wide variety of ways (including programming, writing documentation, translating software and documentation, bug reporting, and financial support), but, in open science, scientists are usually the only participants actively involved in creating and maintaining contributions. Moreover, whereas open source projects often attract participants with varied stakes in the software and use cases in mind, open science projects are typically bounded by small communities of specialists with very particular needs, and rarely anticipate use of their work beyond the limited scope of their primary targeted audience. Additionally, open science is bounded by the professional contexts in which science operates, and as such, produces outputs that can be easily credited to specific sets of individuals for reasons of tenure and promotion. Open science projects whose contributions are supported by research funding also face sustainability concerns, as participants lose motivation to contribute once funding runs out. Once a project is completed, papers have been published, and credit has been allocated, it is common for scientists to mark their projects as finished and move on to new endeavours.
Open source projects, on the other hand, are motivated by a more practical need for the software to function properly in perpetuity, and contributors may remain actively or sporadically involved to satisfy users’ needs, or to direct users to derivative and functional forks of abandoned software.
The adoption of open source development models among archaeologists is generally informed by the broader open science movement, which is motivated by a genuine desire to facilitate novel research opportunities, to make participation in scientific research more equitable, to reclaim science as a public good, and to enhance the means of validating findings. However, the predominant concern with identifying the best tools to use, adopting optimal data processing pipelines, and tying into global, web-based infrastructures, protocols and standards (cf. Kansa, Whitcher Kansa, and Arbuckle 2014; Kintigh et al. 2015; Roosevelt et al. 2015) distracts from fundamental tensions and contradictions regarding the actual value of working in the open. For instance, Faniel et al. (2013, 299–301), Atici et al. (2013, 676–77), Huggett (2018), Sobotkova (2018), and Opitz et al. (2021) demonstrate that to make the reuse of archaeological data feasible and useful in a practical sense, it is necessary to re-introduce social friction that these infrastructures are designed to eliminate. In other words, the pressures and circumstances of being an archaeologist and doing archaeological research assert themselves when attempting to make practical use of these infrastructures, and therefore must be accounted for in their design and implementation. Here we aim to identify similar sources of dissonance with regard to the promise, potential, and actual implementation of open source software development models among archaeologists.
3 Data and methodology
Our study comprises an exploratory analysis of open-archaeo (Batist and Roe 2023), a list of open source archaeological software and other digital resources. As of right now it includes 408 items, which are rendered for the web at open-archaeo.info. We compiled the dataset by browsing collaborative software development platforms such as GitHub, GitLab and Codeberg, and by tracing connections to other personal, professional, and institutional websites that describe and host additional archaeological software. This entailed manually crawling through users’ profiles, particularly those who identify as archaeologists or who have contributed to tools that pertain to archaeological work. We supplemented this quasi-systematic collection strategy with word-of-mouth contributions made by interested individuals who identified relevant work that we initially overlooked.
Open-archaeo is a relatively comprehensive list. However, it generally overlooks code written before archaeologists started using collaborative software development platforms such as GitHub, and it lacks references to code that is not available on the web. Open-archaeo is also limited by the experiences of its primary maintainers, and we welcome contributions from anyone, especially domain specialists who are familiar with the kinds of tools commonly used in their specific fields.
For the purpose of this study, we augmented the core open-archaeo dataset with data obtained from GitHub’s public API. That is, for each project in open-archaeo with a GitHub repository, we added information on individual contributors, commits, issues, etc. from the API using the R package gh (Csárdi et al. 2023). This does reduce our sample size slightly; about an eighth of projects don’t use version control, at least not publicly, but among those that do, GitHub is by far the most popular host (Table 1).
While our initial intention was to only list open source software, open-archaeo’s scope has expanded to include all software created by and for archaeologists.
The data and analysis code are available in our research compendium on GitHub, which is also indexed on Zenodo.
4 Open archaeology
As of writing, open-archaeo catalogues 553 resources created by and for archaeologists. These are primarily software but also include various forms of open documents. Table 2 summarizes the kinds of resources that appear in open-archaeo, and breaks them down into more precise categories.
| Category | Scope | n |
|---|---|---|
| Software | ||
| Packages and libraries | Sets of functions assembled with clear purpose, and made accessible using standards established by an underlying platform. | 246 |
| Standalone software | Software that may be operated without needing to first access an underlying platform. | 104 |
| Scripts | Sets of pragmatically assembled mutable functions, often lacking complete documentation or adherence to protocols that would otherwise facilitate secondary use outside their original contexts of creation. | 69 |
| Documents | ||
| Lists and datasets | A series of consistently organized observations assembled with purpose. | 75 |
| Guides | An educational resource or documented protocol meant to instruct readers how to apply relevant tools or techniques. | 30 |
| Products | Stable outcomes of creative work. | 15 |
| Specifications, protocols and schemas | A formal data structure or framework intended to be used as a model. | 14 |
Most resources (57%) included in open-archaeo are designed to be used atop an existing “platform”, for example a package that extends a programming language or a plugin for an application. The designers of this code are essentially creating additional functions within the base platform that are useful for archaeological purposes. Others create standalone software that can be run independently of such platforms, for example desktop or web apps. A significant number of projects also comprise datasets and non-packaged code snippets that have been made available for general use.
| Platform | n |
|---|---|
| R | 209 |
| Python | 55 |
| QGIS | 16 |
| Mobile app | 10 |
| MATLAB | 6 |
| ArcGIS | 3 |
| LibreOffice Calc | 3 |
| Microsoft Excel | 3 |
| Nextflow | 3 |
| Open Data Kit | 2 |
| Other | 7 |
38% of projects are extensions to the statistical programming language R, making it the most widely-used ‘platform’ by a large margin (Table 3). Python, another programming language, is also relatively popular, as are plugins for the open source geographic information system QGIS. Beyond that, there is a rather fragmented landscape of plugins for other desktop software (e.g. AutoCAD, ArcGIS), a number of lesser used programming languages, and a genre consisting of custom forms and spreadsheet templates. Many of these are targeted by only one or two developers; the larger platforms tend to be more diverse.
At first glance, the relative popularity of R versus Python is perhaps surprising; Python is regularly ranked as the most popular programming language in the world, with R a distant runner-up. However, it accords with the popularity of R as a tool for data analysis in archaeology (Schmidt and Marwick 2020) and other scientific disciplines (Lai et al. 2019).
We also annotated each record with ‘tags’ that describe the aspects of archaeological work that each tool contributes to (Figure 1). The most common tags unsurprisingly deal with work that naturally benefits from advanced information processing afforded by computers, such as statistical analysis, sample calibration, geographical analysis, data management, and chronological modelling. Educational resources and practical guides are also well represented due to the web’s usefulness as a medium for sharing and communication.
When we compare categories with tags, we see the general domains that each kind of resource is designed to serve. Packages are fairly common across the board. Tags that are notable for having a higher proportion of standalone software include archaeogenetics, data management, 3D modelling, photogrammetry, drivers and IO, and simulations or agent based modelling. These tools may require greater access to system resources, or user interfaces that are more complex than what R or Python IDEs (integrated development environments) tend to provide.
We found that many repositories do not explicitly denote an open source license, or any license at all (Table 4).
| License | N |
|---|---|
| None detected | 262 |
| GNU General Public License v3.0 | 122 |
| MIT License | 56 |
| Other | 52 |
| GNU General Public License v2.0 | 22 |
| Creative Commons Zero v1.0 Universal | 11 |
| Apache License 2.0 | 8 |
| GNU Affero General Public License v3.0 | 5 |
| BSD 3-Clause "New" or "Revised" License | 3 |
| Creative Commons Attribution 4.0 International | 3 |
| The Unlicense | 3 |
| Creative Commons Attribution Share Alike 4.0 International | 2 |
| Artistic License 2.0 | 1 |
| CeCILL Free Software License Agreement v2.1 | 1 |
| GNU Lesser General Public License v2.1 | 1 |
| Mozilla Public License 2.0 | 1 |
Archaeological software development activity has increased significantly over the years. Figure 2 shows the cumulative growth of code contributions committed and pushed to GitHub repositories, and the number of GitHub repositories that host archaeological software and resources.
As we can see, archaeologists have been using git from even before GitHub was launched in 2008. But use of git really began to take off around 2014–2015, when we see an uptick in the rate of growth that has continued ever since.
Around this time we also see that GitHub starts being used to host documents and scripts. This may represent a recognition of GitHub’s ability to track things other than code, and a willingness to experiment with version control systems as a medium for disseminating work in an open and somewhat nerdy way.
5 Collaborative practices
As well as hosting source code, GitHub and other software forges include systems for facilitating collaboration on code and other projects. The basic collaborative workflow is inherited from git, which allows multiple users to commit (see ?@tbl-glossary for definitions of this and other git terminology used in this section) code to the repository. A user with commit access to a repository can change any of its contents at will, so this is usually reserved for the project maintainer and known, trusted collaborators. GitHub extends this model with its pull request feature, by which any user can offer to contribute code to a repository to which they don’t have commit access. The maintainer may then choose to merge (accept) or decline the pull request, facilitating contributions from a wider network of collaborators without the need for permission to be sought in advance.
We measured the lifespan of a repository as the time between the first and latest commit, and its activity as the rate of commits. We therefore refer here to the development lifespan of a project, which is not necessarily related to its use-life. By these metrics, the lifespan and activity of repositories in open-archaeo vary greatly (Figure 3). The average project lasts 1043 days with 0.76 commits per day. Many projects are active for only a short period of time: about 16% for less than 30 days, 24% for less than 90 days, and 37% for less than a year. However, the vast majority (all but 4) do have more than one commit, suggesting that use of GitHub as a pure host for already-finished projects is not common; some degree of iteration, if not collaboration, is almost always present. The longest-lived projects have been active for between english(round(lifespan_top10_range / 365)[1]) and english(round(lifespan_top10_range / 365)[2]) years. The most active projects see up to round(max_commit_rate) commits per day, but the majority of repositories (84%) receive fewer than one commit per day.
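The two metrics defined above can be illustrated with a minimal Python sketch. It is not the analysis code from our research compendium: the function name `lifespan_and_activity`, the example timestamps, and the treatment of repositories whose commits all share one timestamp are assumptions made for illustration.

```python
from datetime import datetime


def lifespan_and_activity(commit_times):
    """Return (lifespan in days, commits per day) for one repository."""
    first, last = min(commit_times), max(commit_times)
    # Lifespan: time elapsed between the first and the latest commit.
    lifespan_days = (last - first).total_seconds() / 86400
    # Assumption: if all commits share one timestamp, the rate is undefined.
    if lifespan_days > 0:
        rate = len(commit_times) / lifespan_days
    else:
        rate = float("nan")
    return lifespan_days, rate


# Hypothetical repository with three commits spread over twenty days.
commits = [datetime(2020, 1, 1), datetime(2020, 1, 11), datetime(2020, 1, 21)]
lifespan, rate = lifespan_and_activity(commits)
print(f"{lifespan:.0f} days, {rate:.2f} commits per day")
```

For this hypothetical repository the sketch reports a lifespan of 20 days and a rate of 0.15 commits per day.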
The interaction between project longevity, activity, and number of contributors is multifaceted (Figure 4). Highly active projects (one commit per day or more) tend to be either very long-lived or very short-lived; few fall in the centre of the distribution. Short-lived projects tend to be characterised by a ‘spree’ of activity (a high commit rate), while long-lived projects have a broader range of activity profiles. The most “successful” projects according to open source norms (i.e. long-lived and active) are, with few exceptions, those projects with the largest contributor base in our dataset. However, the modal project in the centre of the distribution is more modest, lasting around three years, maintained by an individual or a small group, with around three commits per month.
Perhaps as a result of this, we also find that only a minority of projects make much use of collaborative features of GitHub such as issues and pull requests (left) and comments (right). Only 39% of repositories make use of these features. Of the repositories with any issues or pull requests, 28% have only 1 and 82% have fewer than 10. However, among repositories that have any comments posted in reply to issues and pull requests, there is a significant degree of activity. This suggests that people are willing to respond to issues raised by others, but may not necessarily want to take the initiative, for a variety of reasons that may be more effectively investigated through qualitative analysis of the conversations in the issue threads.
GitHub also facilitates collaboration on broader project management tasks, primarily through its issues feature.1 Unless a repository’s maintainer specifically configures it otherwise, any user can create an issue attached to another user’s repository, or comment on an existing issue. Issues are typically used to log and track bug reports, feature requests, and other comments and suggestions from the project’s user base. GitHub’s pull request feature is also implemented via this system – a pull request is a special type of issue.
- How many people are involved?
- Conversely, how many projects are individual people involved in?
Perhaps unfortunately, then, we also find that nearly 60% of projects have only a single contributor, and projects with more than a dozen are very rare indeed (<3%). A “contribution”, in this case, is a commit, issue, or comment applied by a user to a repository.
Furthermore, when we break down the contributions made by all contributors in multi-contributor projects, we also see that work tends to be distributed very unevenly. Of multi-contributor repositories, 92% have a third of contributions made by a single person, 65% have half of all contributions made by a single person, and 36% have three quarters of all contributions made by a single person.
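The concentration measure reported above can be sketched as follows. This is an illustrative reading rather than our analysis code: we assume that "a third of contributions made by a single person" means the most active contributor accounts for at least that share, and the repository and user names are hypothetical.

```python
from collections import Counter


def top_contributor_share(authors):
    """Share of a repository's contributions made by its most active contributor."""
    counts = Counter(authors)
    return max(counts.values()) / sum(counts.values())


def share_of_repos_at_or_above(repos, threshold):
    """Among multi-contributor repositories, the fraction whose top
    contributor accounts for at least `threshold` of all contributions."""
    multi = [a for a in repos.values() if len(set(a)) > 1]
    hits = sum(top_contributor_share(a) >= threshold for a in multi)
    return hits / len(multi)


# Hypothetical data: one author entry per commit, issue, or comment.
repos = {
    "repo_a": ["alice"] * 9 + ["bob"],             # top share 0.9
    "repo_b": ["alice", "bob", "carol", "dave"],   # top share 0.25
    "repo_c": ["carol"] * 5,                       # single contributor, excluded
}

for threshold in (1 / 3, 1 / 2, 3 / 4):
    print(threshold, share_of_repos_at_or_above(repos, threshold))
```

Single-contributor repositories are excluded before the shares are computed, mirroring the restriction to multi-contributor projects in the text.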
Another way GitHub users can engage with repositories and other users is through social media-like features such as starring a repository, adding it to a public or private list, or following a user. These actions populate a timeline through which users can see recent activity and discover new projects related to those they have interacted with in the past.2 While not constituting a direct form of collaboration, these features can facilitate the formation and maintenance of collaborative networks, in the same way that other social media platforms serve other professional networks.
- Summarise: overall, what is the level of uptake of GitHub’s collaboration features?
- Has it changed over time? (?@fig-collaboration-ts)
The prototypical collaborative open source project on GitHub has a core group of developers (often a single maintainer) who regularly commit new code, a wider network of collaborators who contribute through pull requests, and an active user base who create and comment on issues and who have indicated their support for the project by starring its repository. The data we collected on open-archaeo repositories on GitHub show that uptake of its collaboration features is highly uneven, with very few projects resembling the open source ideal. The majority of projects are short-lived, with few contributors and a small number of commits. Only a fraction of repositories use GitHub’s extended collaboration features (pull requests, issues, comments), and those that do use them in a limited way. Conversely, a small number of projects () do appear to fit the aspirational open source model, with a large number of contributors maintaining a long-lived and highly-active repository. Archaeologists active on GitHub also make markedly greater use of GitHub’s social media-like features (stars, following) than of more direct collaborative actions.
6 An emerging community of practice?
We constructed and analyzed networks representing users’ contributions to GitHub repositories to examine the emergence of collaborative tendencies. We sought to identify clusters of repositories that share common sets of contributors, and the different kinds of contributions that draw the network together.
We started by constructing a graph connecting users to the repositories that they contributed to, accounting for commits, issues, and comments as distinct kinds of relations (Figure 6). We then extracted two one-mode networks from this, one connecting repositories by common users, and the other connecting users by common repositories.
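The two-step construction described above (a two-mode user-repository graph, followed by two one-mode projections) can be sketched as follows. This is an illustrative reconstruction in plain Python rather than the code used in this study. The `project` function, its parameters, and the toy records are hypothetical, and weighting edges by the number of shared neighbours is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, repository, kind of contribution) records.
contributions = [
    ("ann", "repo-a", "commit"),
    ("ann", "repo-b", "issue"),
    ("bob", "repo-b", "commit"),
    ("bob", "repo-c", "comment"),
    ("cat", "repo-c", "commit"),
]


def project(contributions, onto="repo", kinds=None):
    """One-mode projection of the user-repository graph.

    onto="repo" links repositories that share a user;
    onto="user" links users that share a repository.
    `kinds` optionally restricts the relations considered.
    Returns {(node_a, node_b): number of shared neighbours}.
    """
    groups = defaultdict(set)
    for user, repo, kind in contributions:
        if kinds is not None and kind not in kinds:
            continue
        if onto == "repo":
            groups[user].add(repo)  # repositories touched by the same user
        else:
            groups[repo].add(user)  # users active in the same repository
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


repo_repo = project(contributions, onto="repo")
user_user = project(contributions, onto="user")
print(repo_repo)
print(user_user)
```

Nodes that never appear in a projected edge correspond to the isolates discussed below: repositories with a single contributor, or users who contributed only to their own repositories.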
6.1 repo-repo
We applied the edge-betweenness community detection method (Girvan and Newman 2002) to identify clusters of repositories that share common sets of contributors of all kinds. Aside from isolate nodes (n = XXX, not visible in our visualizations), which represent repositories with only a single contributor, we detected 21 clusters (?@fig-graph-repo-repo-edge-betweenness-dendrogram). While many of these clusters are interconnected, some discrete components containing between 2 and 7 repositories appear distinct from the primary core.
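For readers unfamiliar with the method, the following is a minimal sketch of edge-betweenness community detection on an unweighted, undirected graph. It is not the implementation used in this study. Stopping at a fixed number of communities (`n_communities`) is an assumption made for brevity; the full method produces a dendrogram that is then cut, for example by maximising modularity. All names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path betweenness of each edge (Brandes-style accumulation)."""
    bet = defaultdict(float)
    for s in adj:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every pair of endpoints is visited from both ends, so halve the totals.
    return {edge: b / 2 for edge, b in bet.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, n_communities):
    """Remove the highest-betweenness edge until n_communities components remain."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    while len(components(adj)) < n_communities and any(adj.values()):
        bet = edge_betweenness(adj)
        a, b = max(bet, key=bet.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Hypothetical toy graph: two triangles of repositories joined by one bridge.
edges = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r3", "r4"),
         ("r4", "r5"), ("r5", "r6"), ("r4", "r6")]
communities = girvan_newman(edges, 2)
print(sorted(sorted(c) for c in communities))
```

In the toy graph the bridge between `r3` and `r4` carries every shortest path between the two triangles, so it is removed first and the triangles are recovered as separate communities.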
This graph identifies three core clusters and several peripheral clusters. The core clusters are characterized by repositories whose contributors commit to projects other than their own. The peripheral clusters largely correspond with the work of single individuals, and sometimes also their close colleagues. Peripheral clusters that are connected to core clusters by only a few relationships represent the sole (or perhaps initial) integration of lone developers into a broader community.
| Repository | Betweenness Centrality | Category | Tags | Commits |
|---|---|---|---|---|
| zackbatist/open-archaeo | 12478.619 | Lists and datasets | Lists | 323 |
| benmarwick/ctv-archaeology | 7602.978 | Lists and datasets | Lists | 621 |
| ropensci/c14bazAAR | 4995.064 | Packages and libraries | API interfaces and web scrapers | 1031 |
| ISAAKiel/oxcAAR | 2456.627 | Packages and libraries | Radiocarbon dating, calibration and sequencing; Chronological modelling | 246 |
| ISAAKiel/mortAAR | 2168.148 | Packages and libraries | Zooarchaeology | 439 |
| tesselle/tabula | 2105.234 | Packages and libraries | Statistical analysis; Seriation | 626 |
| ahb108/rcarbon | 2072.377 | Packages and libraries | Radiocarbon dating, calibration and sequencing | 881 |
| archaeology/archaeology | 1971.925 | Lists and datasets | Lists | 25 |
| Chronomodel/chronomodel | 1771.484 | Standalone software | Chronological modelling | 1515 |
| ropensci/neotoma | 1460.293 | Packages and libraries | API interfaces and web scrapers | 809 |
The three core clusters each have a distinct character. One focuses on archaeogenetics and is marked by a very well established collaborative network and a general reliance on data modelling and processing tools. A second cluster is mostly centred on fieldwork-oriented data collection tools, and particularly on the emergence of well-funded and well-supported dominant platforms that attract more attention than other independent projects scattered across the network. The third and most significant cluster includes a smorgasbord of projects whose contributors share varied interests. The emphasis in this latter cluster is on the formation of a central software development community rather than on any specific topic of work. Many of the projects represented in this third cluster emerge from underlying professional partnerships, namely research labs (e.g. ISAA-Kiel) and special interest groups (SSLA).
6.2 user-user
We assembled graphs linking users based on common contributions to the same repositories. As with the repo-repo network, we excluded isolate nodes (n = xxx), which represent users who only contributed to their own repositories, from the visualization.
We sought to identify whether users who contribute in certain distinct ways play different roles in the overall network. We are reluctant to share personal information about specific users without their informed consent, but based on our knowledge of the community we found that the people with the highest betweenness values are those who primarily produce computational archaeology code as their job. Moreover, we found that these people tend to be employed under precarious circumstances. Although precarious employment is part of our sad reality, in the context of developing and maintaining open source software it presents a serious source of risk. The people who occupy central positions in these networks are crucial community members who hold the network together, and if they are unable to continue contributing or decide to leave archaeology entirely, the overall network would fragment.
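The fragmentation risk described above can be made concrete with a small sketch that counts connected components before and after a node is removed. This is only an illustration on hypothetical data, not an analysis of the actual network; `count_components` and the toy edge list are invented for the example.

```python
def count_components(edges, removed=frozenset()):
    """Number of connected components once the `removed` nodes are dropped."""
    nodes = {n for edge in edges for n in edge} - set(removed)
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1  # a new, previously unvisited component
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
    return count


# Hypothetical user-user network in which "hub" bridges two groups.
edges = [("hub", "u1"), ("hub", "u2"), ("hub", "u3"),
         ("u1", "u2"), ("u3", "u4")]

print(count_components(edges))                   # whole network
print(count_components(edges, removed={"hub"}))  # after the central user leaves
print(count_components(edges, removed={"u4"}))   # after a peripheral user leaves
```

In this toy case, removing the bridging user splits the network in two, whereas removing a peripheral user leaves it connected.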
We also applied the same betweenness centrality algorithm to a subnetwork whose links are based only on issues and comments, and not code contributions. In this subnetwork, people who commit less code have higher betweenness scores. However, many of the people with high betweenness in the graph representing all contributions also appear here. Those who appear in both lists tend to contribute as both committers and commenters. The list also includes a number of contributors who never or rarely commit code. Although it is beyond this study’s scope, a qualitative analysis of issues and comments may yield more insight into the kinds of contributions that each of these participants makes.
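The comparison between the full network and the issues-and-comments subnetwork can be sketched as follows. This is a minimal illustration of unnormalised betweenness centrality (Brandes' algorithm) on an unweighted, undirected graph, not the code used in this study. Labelling each user-user link with a single contribution kind is a simplifying assumption, and all names and data are hypothetical.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Unnormalised shortest-path betweenness of each node."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair is counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical user-user links, labelled by the kind of shared contribution.
links = [
    ("ann", "bob", "commit"),
    ("bob", "cat", "commit"),
    ("cat", "dan", "comment"),
    ("dan", "eve", "issue"),
    ("dan", "fay", "comment"),
]

all_scores = betweenness([(a, b) for a, b, _ in links])
discussion_scores = betweenness(
    [(a, b) for a, b, kind in links if kind in {"issue", "comment"}]
)
print(sorted(all_scores.items(), key=lambda kv: -kv[1]))
print(sorted(discussion_scores.items(), key=lambda kv: -kv[1]))
```

In the toy data, users connected only through commits drop out of the discussion subnetwork entirely, while a user who bridges both kinds of activity remains central in each.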
7 Conclusion
Archaeology is a “magpie” discipline, using tools and methods that originate from a variety of other disciplines and industries to better understand the past. Moreover, archaeologists are accustomed to working on shoestring budgets, adapting their tools to suit specific, non-generalizable situations, and applying a DIY work ethic to create and co-create with others to get the job done (Batist et al. 2021; Morgan and Eve 2012; Caraher 2019). It is therefore unsurprising that archaeologists are prolific open source software developers.
In this study, we documented an emerging community of open source software engineers in archaeology and assessed the extent to which collaboration within this community resembles conventional ‘open source’ forms. We operationalised ‘open-source collaborative practices’ as the use of certain features of git and GitHub visible to us in data from the GitHub API. This excludes a wide range of other potential practices, most obviously the use of other software (though as we demonstrate in section X.X, the git/GitHub combination is overwhelmingly the most popular), but also collaboration through offline or private channels, or forms of collaboration we don’t know about. We have not directly interviewed those doing the collaborating, though our conclusions do draw heavily from our experience as members of that community ourselves. Our earliest data is from XXX and our study can say little about collaborative software development in archaeology before this point, though we know there was a significant amount of it (Ducke 2013, doi:10.4324/9781315431932-12; Whallon 1972, doi:10.1007/BF02403759).
These caveats notwithstanding, we have documented that open source software development in archaeology has seen a rapid and sustained rise beginning around 2014. This is marked by a variety of applications and use cases, including the use of git and GitHub to track and host content other than code. Moreover, archaeologists are very involved in broader scripting ecosystems, as is evident through the predominant creation of R packages and Python libraries designed to process the rich variety of archaeological information. At the same time, archaeologists also create standalone software for more intensive tasks that require greater access to system resources or that warrant more complex user interfaces than what R and Python IDEs are capable of providing.
Looking at the distribution of tags, we note that, among items listed as software, repositories tend to be focused on various means of identifying distribution patterns (spatial, temporal, statistical), calibrating data obtained from various instrumental methods (XRF, luminescence dating), supporting specialized finds analysis (zooarchaeology, palaeobotany, archaeogenetics), and supporting the collection and processing of archaeological materials. These foci signify gaps in the archaeological toolbox that archaeologists recognized, and have attempted to fill, on their own terms. In future studies, it would be interesting to examine how the purposeful expansion of open source tools corresponds with broader methodological trends apparent in publishing patterns or through other expressions of archaeological interest (e.g. social media posts, conference sessions).
While there is an emerging community of practice around open source in archaeology, we observed that collaboration remains limited. Most work is performed individually and is short-lived. The vast majority of repositories have 1–3 contributors, with only a few distinguished by an active and diverse developer base. Our analysis also shows an uneven use of git and GitHub’s extended features, beyond their basic usage as a version control system and repository host. Generally speaking, we suspect that this is because people do not want to step on other people’s toes by raising issues or intruding on other people’s projects. This may relate to the fact that most developers on this list are academics, whose values differ from those of the designers of open source development environments regarding how collaboration should occur: for example, how projects and ideas are ‘owned’ by individuals or communities, and how work should be iteratively improved upon.
However, we seem to have entered a period of critical reflection on the practical ramifications of infrastructures developed to support open science. For instance, Huggett (2022) identified social frictions as a necessary aspect of productive research, which open science infrastructures effectively seek to eliminate. Numerous other studies by Batist (2023), Huvila (2022), Haciguzeller et al. (2021), Opitz et al. (2021), Sobotkova (2018), Wylie (2017), Atici et al. (2013), and Faniel et al. (2013), among others, highlight the need to account for the work involved in preparing and making use of materials shared openly on the web, work that is effectively muted by infrastructures developed in pursuit of open science.
Our network analysis similarly draws attention to the real-world collaborative ties that underpin archaeological open source software development. We identified three clusters of close associations between projects: the first is centred around a highly specialized subdiscipline exhibiting a very enthusiastic attitude (or perhaps even pressure) toward working in the open; the second is concerned with developing solutions to a very common and discipline-wide problem, specifically relating to data collection and management in fieldwork; and the third deals with several topics of concern, but represents an emerging community of practice. This third cluster comprises efforts performed by members of a close-knit community, many of whom already collaborate at local research centres or participate in relatively private interest groups. This indicates that open source is firmly embedded within the existing social and institutional support and power structures that permeate academic life, both online and offline.
References
Footnotes
1. Apart from issues, GitHub has a very wide range of project management and social media-like features, including wikis, discussion forums and ‘kanban’ boards. We have not analysed the use of these features here.
2. This feature of GitHub’s timeline was one of the primary ways we compiled open-archaeo.